## load data collection exercise data
## merged is a merged long dataset of baseline and daily
dm <- as.data.table(read_sav("[2021] PSY4210 merged.sav"))

## Remind R which of our variables are factors
dm[, sex := factor(
  sex,
  levels = c(1, 2),
  labels = c("male", "female"))]

dm[, relsta := factor(
  relsta,
  levels = c(1, 2, 3),
  labels = c("single", "in a committed exclusive relationship",
             "in a committed nonexclusive relationship"))]
1 LMM Notation
Let’s consider the formula for a relatively simple LMM:

\[
y_{ij} = b_{0j} + b_{1} \cdot x_{1j} + b_{2} \cdot x_{2ij} + \varepsilon_{ij}
\]
Here, as before, the i indicates the ith observation within a specific unit and the j indicates the jth unit (in psychology the unit is usually a person, but it could also be classrooms, doctors, etc.).
Regression coefficients, the \(b\)s, with a j subscript indicate both a fixed and a random effect. That is, the coefficient is allowed to vary across the units, j. As before, in practice these coefficients are decomposed into a fixed and a random part:
\[
b_{0j} = \gamma_{00} + u_{0j}
\]
and we estimate in our LMM the fixed effect part, \(\gamma_{00}\), and the variance / standard deviation of the random effect or the covariance matrix if there are multiple random effects, \(\mathbf{G}\):
\[
u_{0j} \sim \mathcal{N}(0, \mathbf{G})
\]
Regression coefficients without any j subscript indicate fixed-only effects: effects that do not vary across units, j. These are fixed effects and get estimated directly.
Predictors / explanatory variables, the \(x\)s, with an i subscript indicate that the variable varies within a unit. Note that the outcome, \(y\), must vary within units to be used in a LMM.
In this case, the notation tells us the following:
\(y_{ij}\) the outcome variable, which varies both within and between people
\(b_{0j}\) the intercept regression coefficient, which is both a fixed and random effect
\(b_1\) the regression coefficient, slope, for the first predictor, which is a fixed effect only
\(x_{1j}\) the first predictor/explanatory variable, this is a between unit variable only, as the lack of an i subscript indicates it does not vary within units. It could not have a random slope.
\(b_2\) the regression coefficient, slope, for the second predictor, which is a fixed effect only
\(x_{2ij}\) the second predictor/explanatory variable, this variable varies within individuals as shown by its i subscript. It could have a random slope, although in this model, it only has a fixed effect slope.
\(\varepsilon_{ij}\) the residuals, these vary within and between units.
The following decision tree provides some guide to when a predictor / explanatory variable can be a fixed and random effect.
Type of effect decision tree
Let’s see two examples of putting this basic model into practice.
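The code for the first example is recoverable from the Formula line in the output below; the call would have been along these lines (a sketch, assuming lme4/lmerTest are loaded and dm is the data table loaded above):

```r
## fixed effect of baseline loneliness plus a
## fixed and random slope for daily stress
summary(lmer(dEnergy ~ loneliness + dStress + (dStress | ID), data = dm))
```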
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: dEnergy ~ loneliness + dStress + (dStress | ID)
Data: dm
REML criterion at convergence: 947.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.6247 -0.6572 -0.0176 0.7130 2.5089
Random effects:
Groups Name Variance Std.Dev. Corr
ID (Intercept) 1.3303 1.153
dStress 0.0612 0.247 -0.87
Residual 1.2957 1.138
Number of obs: 283, groups: ID, 88
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 5.3885 0.4364 89.1320 12.35 <2e-16 ***
loneliness -0.3678 0.1692 83.2661 -2.17 0.033 *
dStress -0.1456 0.0678 60.1490 -2.15 0.036 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) lnlnss
loneliness -0.738
dStress -0.531 -0.134
Now with two random effects, we assume that the random effects, \(u_{0j}\) and \(u_{2j}\), which we collectively denote \(\mathbf{u}_{j}\) follow a multivariate normal distribution with covariance matrix \(\mathbf{G}\).
Based on the little decision chart, between-unit only variables, like \(loneliness_j\) and \(Bstress_j\), cannot be random effects. In the data collection exercise, we measured loneliness at baseline and also in the daily diary questionnaires. In this example we are using the baseline (trait) loneliness and not the daily one. Also, while it is technically possible for something to be only a random effect without a corresponding fixed effect, it’s not common and not recommended, as it would be equivalent to assuming that the fixed effect, the mean of the distribution, is 0, which is rarely appropriate.
2 Interactions in LMMs
Interactions in LMMs work effectively the same way that interactions in GLMs do, although there are a few nuances in options and possible interpretations. Using the notation from above, let’s consider a few different possible interactions.
2.1 Cross Level (Between and Within Unit) Interactions
First, let’s take our model with loneliness and stress and include an interaction between them.
## long way
summary(lmer(dEnergy ~ loneliness + dStress + loneliness:dStress + (1 | ID),
             data = dm))
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: dEnergy ~ loneliness + dStress + loneliness:dStress + (1 | ID)
Data: dm
REML criterion at convergence: 952.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.6070 -0.6090 -0.0285 0.7734 2.4928
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 0.40 0.632
Residual 1.36 1.165
Number of obs: 283, groups: ID, 88
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 6.0085 0.9593 203.5738 6.26 2.2e-09 ***
loneliness -0.6410 0.4681 228.2603 -1.37 0.17
dStress -0.2747 0.2193 228.3756 -1.25 0.21
loneliness:dStress 0.0555 0.1041 247.7271 0.53 0.59
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) lnlnss dStrss
loneliness -0.962
dStress -0.919 0.877
lnlnss:dStr 0.901 -0.929 -0.962
## short hand in R for simple main effect + interaction
## identical to the above, but shorter
summary(lmer(dEnergy ~ loneliness * dStress + (1 | ID), data = dm))
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: dEnergy ~ loneliness * dStress + (1 | ID)
Data: dm
REML criterion at convergence: 952.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.6070 -0.6090 -0.0285 0.7734 2.4928
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 0.40 0.632
Residual 1.36 1.165
Number of obs: 283, groups: ID, 88
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 6.0085 0.9593 203.5738 6.26 2.2e-09 ***
loneliness -0.6410 0.4681 228.2603 -1.37 0.17
dStress -0.2747 0.2193 228.3756 -1.25 0.21
loneliness:dStress 0.0555 0.1041 247.7271 0.53 0.59
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) lnlnss dStrss
loneliness -0.962
dStress -0.919 0.877
lnlnss:dStr 0.901 -0.929 -0.962
The relevant new part is the interaction term, \(b_3\), a fixed effect in this case. If we focus on just that one term, we see that the coefficient, \(b_3\), is applied to the arithmetic product of two variables, here loneliness and stress. As it happens, one of them, loneliness, varies between units whereas the other, stress, varies within units. You will sometimes see this termed a “cross level” interaction because it involves both a between-varying and a within-varying variable.
\[
b_3 * (loneliness_{j} * stress_{ij})
\]
As with interactions for regular GLMs, interactions in LMMs can be interpreted in different ways. The two common interpretations are easiest to see by factoring the regression equation. Here are three equivalent equations that highlight different ways of viewing the interaction:

\[
y_{ij} = b_{0j} + b_1 \cdot loneliness_{j} + b_2 \cdot stress_{ij} + b_3 \cdot (loneliness_{j} \cdot stress_{ij}) + \varepsilon_{ij}
\]

\[
y_{ij} = b_{0j} + b_1 \cdot loneliness_{j} + (b_2 + b_3 \cdot loneliness_{j}) \cdot stress_{ij} + \varepsilon_{ij}
\]

\[
y_{ij} = b_{0j} + (b_1 + b_3 \cdot stress_{ij}) \cdot loneliness_{j} + b_2 \cdot stress_{ij} + \varepsilon_{ij}
\]

The latter two formats highlight how the simple effect of stress varies by loneliness and how the simple effect of loneliness varies by stress.
The nuance in LMMs comes in because some variables vary only between units and others within units. For example, when interpreting the interaction with respect to the simple effect of stress, we could say that the association between daily stress and energy on the same day depends on the loneliness of a participant. Conversely, when interpreting with respect to the simple effect of loneliness, we could say that the association of participant loneliness and average energy depends on how stressed someone is feeling on a given day. Loneliness varies between people, stress varies within people, so that must be taken into account in the interpretation.
2.2 Between Unit Interactions
The same approach works with other types of variables in LMMs. For example, here we have a model with loneliness and sex as predictors. Both vary only between units.
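The model itself is not shown in these notes; a sketch of what it might look like, using the sex and loneliness variables defined at load time (mbetween is a hypothetical name):

```r
## both predictors vary only between units (people),
## so only a random intercept is possible here
mbetween <- lmer(dEnergy ~ loneliness * sex + (1 | ID), data = dm)
summary(mbetween)
```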
When interpreting the interaction with respect to the simple effect of sex, we could say that the association between participant sex and average energy depends on the loneliness of a participant. Conversely, when interpreting with respect to the simple effect of loneliness, we could say that the association of participant loneliness and average energy depends on the participant’s sex.
2.3 Within Unit Interactions
Interactions can also involve two within-unit variables, such as daily stress and daily self-esteem. When interpreting the interaction with respect to the simple effect of stress, we could say that the association between daily stress and energy on the same day depends on same day self-esteem level. Conversely, when interpreting with respect to the simple effect of self-esteem, we could say that the association of daily self-esteem and same day energy depends on how stressed someone is feeling on a given day.
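A sketch of this within x within model, assuming the daily self-esteem variable is named dSE (the name used later in these notes; mwithin is a hypothetical name):

```r
## both predictors vary within units, so in principle
## either (or both) could also have a random slope
mwithin <- lmer(dEnergy ~ dStress * dSE + (1 | ID), data = dm)
summary(mwithin)
```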
3 Continuous Interactions in R
Aside from the notes about some minor interpretation differences, in general interactions in LMMs are analysed, graphed, and interpreted the same way as for GLMs.
First, to avoid any issues around diagnostics etc. arising from haven’s labelled data type, we will convert the variables we are going to work with to numeric. Then we fit a LMM with an interaction between stress and neuroticism, with energy as the outcome and a random intercept as the only random effect.
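In code, these two steps might look like the following (the call for m matches the one echoed later in these notes; the as.numeric() conversion assumes neuroticism was read in as a haven labelled variable):

```r
## convert from haven labelled to plain numeric
dm[, neuroticism := as.numeric(neuroticism)]

## interaction model with a random intercept only
m <- lmer(dEnergy ~ neuroticism * dStress + (1 | ID), data = dm)
```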
A quick check of the model diagnostics suggests that the data look fairly good. The intercepts do not appear to follow a normal distribution that closely, partly due to the long left tail, but for now we will leave it. If you’re interested in how to fix left skews, please see the bonus material at the end of this markdown (optional).
No extreme values are present. If there were any, we could remove them, as we have discussed in depth in previous lectures, update the model, and re-run diagnostics. In practice, it could take a few rounds of extreme value removal, or you may decide to stop at one round.
Let’s look at the summary of our model, m. Although we have used summary() a lot in the past, we’ll introduce another function to help look at lmer() model results: modelTest(). In this lecture, we will only learn and interpret part of its output; the rest of the output from modelTest() will be covered later. In addition, to get nicely formatted results rather than a set of datasets containing the results, we use the APAStyler() function.
APAStyler(modelTest(m))
Warning: 'oldNames' is deprecated. Please use 'signames' instead.
Parameters and CIs are based on REML,
but modelTests requires ML not REML fit for comparisons,
and these are used in effect sizes. Refitting.
Term Est Type
<char> <char> <char>
1: (Intercept) 6.24*** [ 4.93, 7.56] Fixed Effects
2: dStress -0.37* [-0.69, -0.04] Fixed Effects
3: neuroticism -0.49* [-0.88, -0.10] Fixed Effects
4: neuroticism:dStress 0.07 [-0.03, 0.16] Fixed Effects
5: sd_(Intercept)|ID 0.61 Random Effects
6: sigma 1.17 Random Effects
7: Model DF 6 Overall Model
8: N (Groups) ID (88) Overall Model
9: N (Observations) 283 Overall Model
10: logLik -469.35 Overall Model
11: AIC 950.70 Overall Model
12: BIC 972.57 Overall Model
13: Marginal R2 0.09 Overall Model
14: Marginal F2 0.10 Overall Model
15: Conditional R2 0.27 Overall Model
16: Conditional F2 0.37 Overall Model
17: neuroticism (Fixed) 0.04/0.01, p = .015 Effect Sizes
18: dStress (Fixed) 0.02/0.00, p = .027 Effect Sizes
19: neuroticism:dStress (Fixed) 0.01/0.00, p = .160 Effect Sizes
The results show the regression coefficients, asterisks for p-values, and 95% confidence intervals in brackets for the fixed effects, the standard deviations of the random effects, the model degrees of freedom, which is how many parameters were estimated in the model total, and the number of people and observations. For now, we will ignore all the output under row 9, N (Observations). In the case of this model we can see the following.
A LMM was fit with 283 observations from 88 people. There was no significant neuroticism x stress interaction, b [95% CI] = 0.07 [-0.03, 0.16], p = .160.
We can also pass multiple model results in a list together, which puts the results side by side. This is particularly helpful for comparing models with and without covariates, to evaluate whether removing extreme values changed the results substantially, or to compare models with different outcomes.
# Remember: m <- lmer(dEnergy ~ neuroticism * dStress + (1 | ID), data = dm)
# Here's a new model with mood as the outcome to compare:
mtest1 <- lmer(dMood ~ neuroticism * dStress + (1 | ID), data = dm)

APAStyler(list(
  Energy = modelTest(m),
  Mood = modelTest(mtest1)))
Warning: 'oldNames' is deprecated. Please use 'signames' instead.
Parameters and CIs are based on REML,
but modelTests requires ML not REML fit for comparisons,
and these are used in effect sizes. Refitting.
Warning: 'oldNames' is deprecated. Please use 'signames' instead.
Parameters and CIs are based on REML,
but modelTests requires ML not REML fit for comparisons,
and these are used in effect sizes. Refitting.
                           Term                  Energy                    Mood           Type
                         <char>                  <char>                  <char>         <char>
 1:                 (Intercept)   6.24*** [ 4.93, 7.56]   7.07*** [ 6.05, 8.08]  Fixed Effects
 2:                     dStress   -0.37* [-0.69, -0.04]  -0.42** [-0.67, -0.17]  Fixed Effects
 3:                 neuroticism   -0.49* [-0.88, -0.10]    -0.17 [-0.47, 0.14]   Fixed Effects
 4:         neuroticism:dStress      0.07 [-0.03, 0.16]      0.00 [-0.07, 0.07]  Fixed Effects
 5:           sd_(Intercept)|ID                    0.61                    0.43 Random Effects
 6:                       sigma                    1.17                    0.93 Random Effects
 7:                    Model DF                       6                       6  Overall Model
 8:                  N (Groups)                 ID (88)                 ID (88)  Overall Model
 9:            N (Observations)                     283                     283  Overall Model
10:                      logLik                 -469.35                 -402.04  Overall Model
11:                         AIC                  950.70                  816.09  Overall Model
12:                         BIC                  972.57                  837.96  Overall Model
13:                 Marginal R2                    0.09                    0.29  Overall Model
14:                 Marginal F2                    0.10                    0.41  Overall Model
15:              Conditional R2                    0.27                    0.41  Overall Model
16:              Conditional F2                    0.37                    0.69  Overall Model
17:         neuroticism (Fixed)     0.04/0.01, p = .015     0.01/0.01, p = .282   Effect Sizes
18:             dStress (Fixed)     0.02/0.00, p = .027     0.05/0.02, p = .001   Effect Sizes
19: neuroticism:dStress (Fixed)     0.01/0.00, p = .160     0.00/0.00, p = .969   Effect Sizes
These results show that we get similar results when predicting daily energy and daily mood from stress and neuroticism. The relationship between daily stress and daily mood, and between daily stress and daily energy, did not vary by individual differences in neuroticism. In this case, it would make sense to re-run these models without the interaction term to test the main effects of daily stress and neuroticism.
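The re-runs without the interaction term, whose results appear next, would look something like this (mtest2 and mtest3 are hypothetical names):

```r
## main effects only, interaction term dropped
mtest2 <- lmer(dEnergy ~ neuroticism + dStress + (1 | ID), data = dm)
mtest3 <- lmer(dMood ~ neuroticism + dStress + (1 | ID), data = dm)

APAStyler(list(
  Energy = modelTest(mtest2),
  Mood = modelTest(mtest3)))
```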
Warning: 'oldNames' is deprecated. Please use 'signames' instead.
Parameters and CIs are based on REML,
but modelTests requires ML not REML fit for comparisons,
and these are used in effect sizes. Refitting.
Warning: 'oldNames' is deprecated. Please use 'signames' instead.
Parameters and CIs are based on REML,
but modelTests requires ML not REML fit for comparisons,
and these are used in effect sizes. Refitting.
                   Term                  Energy                    Mood           Type
                 <char>                  <char>                  <char>         <char>
 1:         (Intercept)   5.46*** [ 4.74, 6.17]   7.04*** [ 6.50, 7.59]  Fixed Effects
 2:             dStress   -0.15* [-0.27, -0.03] -0.41*** [-0.51, -0.32]  Fixed Effects
 3:         neuroticism   -0.24* [-0.42, -0.06]   -0.16* [-0.30, -0.03]  Fixed Effects
 4:   sd_(Intercept)|ID                    0.61                    0.43 Random Effects
 5:               sigma                    1.17                    0.93 Random Effects
 6:            Model DF                       5                       5  Overall Model
 7:          N (Groups)                 ID (88)                 ID (88)  Overall Model
 8:    N (Observations)                     283                     283  Overall Model
 9:              logLik                 -470.33                 -402.04  Overall Model
10:                 AIC                  950.67                  814.09  Overall Model
11:                 BIC                  968.90                  832.31  Overall Model
12:         Marginal R2                    0.08                    0.29  Overall Model
13:         Marginal F2                    0.09                    0.41  Overall Model
14:      Conditional R2                    0.27                    0.41  Overall Model
15:      Conditional F2                    0.37                    0.69  Overall Model
16: neuroticism (Fixed)     0.05/0.00, p = .009     0.04/0.02, p = .020   Effect Sizes
17:     dStress (Fixed)     0.02/0.01, p = .013     0.30/0.16, p < .001   Effect Sizes
Results are indeed similar for daily mood and energy as outcomes. There are significant negative associations between daily stress and both outcome variables, and also neuroticism and both outcome variables.
3.1 Plotting
Typically, to plot a significant interaction, a few exemplar lines are graphed showing the slope of one variable with the outcome at different values of the moderator. As with GLMs, we can use the visreg() function. Here, we’ll use neuroticism as the moderator. A common approach to picking levels of the moderator is to use the Mean - 1 SD and Mean + 1 SD. To do that, we first need the mean and standard deviation of neuroticism, which we can get using egltable() after excluding duplicates by ID, since neuroticism only varies between units. Note that we are doing this for the sake of example only, since our interaction was not significant.
egltable(c("neuroticism"), data = dm[!duplicated(ID)])
The results show, in an easier to interpret way, what the positive interaction coefficient of \(b = 0.07\) means: people with higher levels of neuroticism are relatively less sensitive to the (negative) effects of stress, although at both levels of neuroticism, higher stress is associated with lower energy. Keep in mind again that we are interpreting this as though the interaction were significant, for the sake of example only.
Another common way of picking some exemplar values is to use the 25th and 75th percentiles. These work particularly well for very skewed distributions, where the mean +/- 1 SD could fall outside the observed range of the data. Again we exclude duplicates by ID and then use the quantile() function to get the values, 3 and 4.5, for the 25th and 75th percentiles.
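That percentile calculation might look like this (a sketch; the as.numeric() wrapper assumes neuroticism may still be haven labelled):

```r
## 25th and 75th percentiles of between-person neuroticism
quantile(dm[!duplicated(ID), as.numeric(neuroticism)], probs = c(0.25, 0.75))
```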
When working with models that have interactions, a common aid to interpretation is to test the simple effects / slopes from the model. For example, previously we graphed the association between stress and energy at M - 1 SD and M + 1 SD on neuroticism. However, although visually both lines appeared to have a negative slope, we do not know from the graph alone whether there is a significant association between stress and energy at both the low (M - 1 SD) and high (M + 1 SD) levels of neuroticism. To answer that, we need to test the simple slope of stress at specific values of neuroticism. Again, this is for demonstration only, given our non-significant moderation. We would not need to compute simple slopes or effects for non-significant moderation in reality.
Our default model does actually give us one simple slope: it is the simple slope of stress when \(neuroticism = 0\). However, as we can tell from the mean and standard deviation of neuroticism, 0 is very far outside the plausible range of values so that simple slope given to us by default from the model is not too useful. We could either center neuroticism and re-run the model, which would get us a different simple slope, or use post hoc functions to calculate simple slopes.
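The centering option is a one-liner (a sketch; neuroticismC and msimple are hypothetical names):

```r
## center neuroticism at the sample mean so the dStress coefficient
## becomes the simple slope at average neuroticism; with unequal
## observations per person you may prefer the mean of person-level
## values, dm[!duplicated(ID), mean(neuroticism, na.rm = TRUE)]
dm[, neuroticismC := neuroticism - mean(neuroticism, na.rm = TRUE)]
msimple <- lmer(dEnergy ~ neuroticismC * dStress + (1 | ID), data = dm)
```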
We will use the emtrends() function from the emmeans package to test the simple slopes. This function also works with GLMs, for your reference.
The emtrends() function takes a model as its first argument, then the variable that you want to calculate a simple slope for (here stress); the at argument requires a list of specific values of the moderator; and then we tell it how we want degrees of freedom calculated (note this only applies to lmer models). We store the results in an R object, mem, and then call summary() to get a summary table. The infer = TRUE argument is needed in summary() if you want p-values.
mem <- emtrends(m, var = "dStress",
  at = list(neuroticism = c(3.46 - 1.11, 3.46 + 1.11)),
  lmer.df = "satterthwaite")

summary(mem, infer = TRUE)
The relevant parts of the output, for us, are the dStress.trend column, which contains the simple slopes; the values of neuroticism, which tell us at what values of neuroticism the simple slopes were calculated; the confidence intervals, lower.CL and upper.CL (95% by default); and the p-value. From these results, we can see that when \(neuroticism = 2.35\) there is a significant negative association between stress and energy, but not when \(neuroticism = 4.57\).
3.3 Sample Write Up
With all of this information, we can plan out some final steps for a polished write up of the results. First, let’s get exact p-values for all our results. We can do this through options to pcontrol in APAStyler(). We also re-print the simple slopes here.
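One plausible call for exact p-values is sketched below; I am assuming the pcontrol list fields accepted by JWileymisc’s APAStyler() (check ?APAStyler for the exact options):

```r
## request exact p-values instead of asterisks (assumed pcontrol fields)
APAStyler(modelTest(m),
  pcontrol = list(digits = 3, stars = FALSE, includeP = TRUE,
                  includeSign = TRUE, dropLeadingZero = TRUE))

## re-print the simple slopes
summary(mem, infer = TRUE)
```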
Now we will make a polished, finalized figure. I have customized the colours and turned off the legends. In place of legends, I have manually added text annotations including the simple slopes, confidence intervals, and p-values for the simple slopes.
visreg(m, xvar = "dStress",
  by = "neuroticism", overlay = TRUE,
  breaks = c(3.46 - 1.11, 3.46 + 1.11),
  partial = FALSE, rug = FALSE, gg = TRUE,
  xlab = "Daily Stress",
  ylab = "Predicted Daily Energy") +
  scale_color_manual(values = c("2.35" = "black", "4.57" = "grey70")) +
  theme_pubr() +
  guides(colour = "none", fill = "none") +
  annotate(geom = "text", x = 3.2, y = 3.9,
    label = "High Neuroticism: b = -0.07 [-0.23, 0.10], p = .447",
    angle = -9) +
  annotate(geom = "text", x = 4, y = 4.4,
    label = "Low Neuroticism: b = -0.21 [-0.36, -0.06], p = .005",
    angle = -22)
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
ℹ The deprecated feature was likely used in the visreg package.
Please report the issue at <https://github.com/pbreheny/visreg/issues>.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the ggplot2 package.
Please report the issue at <https://github.com/tidyverse/ggplot2/issues>.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
A linear mixed model using restricted maximum likelihood was used to test whether the association of daily stress with daily energy is moderated by baseline neuroticism scores. All predictors were included as fixed effects and a random intercept by participant was included. Visual diagnostics showed that energy was normally distributed, and no outliers were present.
The daily stress x neuroticism interaction was not statistically significant, which indicated that the relationship between stress and energy did not vary by neuroticism. Results from the analysis with the interaction dropped revealed that both neuroticism and daily stress were negatively associated with daily energy.
4 Continuous x Categorical Interactions in R
Continuous x Categorical interactions are conducted much as continuous x continuous interactions. Typically with continuous x categorical interactions, simple slopes for the continuous variable are calculated at all levels of the categorical variable.
Let’s illustrate this with a model examining the relationship between daily energy levels and self-esteem with sex as a moderator.
mconcat <- lmer(dSE ~ dEnergy * sex + (1 | ID), data = dm)
The model diagnostics look relatively good, albeit not perfect.
With reasonable diagnostics, we can look at a summary. There is one extreme residual but I’m choosing to leave it in the dataset.
summary(mconcat)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: dSE ~ dEnergy * sex + (1 | ID)
Data: dm
REML criterion at convergence: 771.8
Scaled residuals:
Min 1Q Median 3Q Max
-3.0275 -0.5344 0.0658 0.5515 2.6785
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 0.489 0.699
Residual 0.607 0.779
Number of obs: 282, groups: ID, 88
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.235 0.564 269.175 3.96 9.4e-05 ***
dEnergy 0.619 0.120 273.898 5.16 4.9e-07 ***
sexfemale 0.878 0.600 268.799 1.46 0.144
dEnergy:sexfemale -0.274 0.128 272.789 -2.13 0.034 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) dEnrgy sexfml
dEnergy -0.907
sexfemale -0.940 0.853
dEnrgy:sxfm 0.850 -0.937 -0.904
Factor variables in interactions do not currently work with modelTest(), so if we want to use it, we need to manually dummy code the categorical variable. The results are identical.
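The dummy coding behind the output below might look like the following (a sketch; the indicator name female matches the terms in the table, mconcat2 is a hypothetical name):

```r
## dummy code sex: 1 = female, 0 = male
dm[, female := as.integer(sex == "female")]

mconcat2 <- lmer(dSE ~ dEnergy * female + (1 | ID), data = dm)
APAStyler(modelTest(mconcat2))
```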
Warning: 'oldNames' is deprecated. Please use 'signames' instead.
Parameters and CIs are based on REML,
but modelTests requires ML not REML fit for comparisons,
and these are used in effect sizes. Refitting.
Term Est Type
<char> <char> <char>
1: (Intercept) 2.23*** [ 1.13, 3.34] Fixed Effects
2: dEnergy 0.62*** [ 0.38, 0.85] Fixed Effects
3: dEnergy:female -0.27* [-0.53, -0.02] Fixed Effects
4: female 0.88 [-0.30, 2.05] Fixed Effects
5: sd_(Intercept)|ID 0.70 Random Effects
6: sigma 0.78 Random Effects
7: Model DF 6 Overall Model
8: N (Groups) ID (88) Overall Model
9: N (Observations) 282 Overall Model
10: logLik -380.57 Overall Model
11: AIC 773.14 Overall Model
12: BIC 794.99 Overall Model
13: Marginal R2 0.23 Overall Model
14: Marginal F2 0.29 Overall Model
15: Conditional R2 0.56 Overall Model
16: Conditional F2 1.30 Overall Model
17: dEnergy (Fixed) 0.09/0.15, p < .001 Effect Sizes
18: female (Fixed) 0.00/0.01, p = .144 Effect Sizes
19: dEnergy:female (Fixed) 0.02/0.05, p = .035 Effect Sizes
4.1 Plotting
With continuous x categorical interactions, the easiest approach is to plot the simple slope of the continuous variable by the categorical one as shown in the following.
When working with models that have interactions, a common aid to interpretation is to test the simple effects / slopes from the model. For example, previously we graphed the association between daily energy and self-esteem at each level of the categorical sex variable, i.e. for men and women. However, we cannot tell from the graph whether daily energy is significantly associated with self-esteem for men or women.
To answer that, we need to test the simple slope of energy at the two sex levels. Our default model does actually give us one simple slope: the simple slope of energy for men (i.e., when female = 0), but we might want more.
We will use the emtrends() function from the emmeans package to test the simple slopes.
The emtrends() function takes a model as its first argument, then the variable that you want to calculate a simple slope for (here energy); the at argument requires a list of specific values of the moderator; and then we tell it how we want degrees of freedom calculated (note this only applies to lmer models). We store the results in an R object, mem, and then call summary() to get a summary table. The infer = TRUE argument is needed in summary() if you want p-values.
mem <- emtrends(mconcat, var = "dEnergy",
  at = list(sex = c("male", "female")),
  lmer.df = "satterthwaite")

summary(mem, infer = TRUE)
The relevant parts of the output, for us, are the dEnergy.trend column, which contains the simple slopes; the values of sex, which tell us for which sexes the simple slopes were calculated; the confidence intervals, lower.CL and upper.CL (95% by default); and the p-value. From these results, we can see that daily energy is significantly associated with self-esteem for both men and women, although the association is stronger for male participants than for female participants.
5 Categorical x Categorical Interactions in R
Categorical x Categorical interactions are conducted comparably, although more contrasts / simple effect follow-ups are possible.
Here we will work with Int_Str again, which is a two-level categorical predictor (0 = no interaction with a stranger, 1 = interacted with a stranger that day) and energy as the outcome. We also work with a three-level conscientiousness variable.
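The model fit, whose summary appears below, can be recovered from the Formula line in the output (a sketch, assuming cons3 already exists in dm as a three-level factor with levels Low, Mid, High):

```r
## categorical x categorical interaction, random intercept only
mcat2 <- lmer(dEnergy ~ cons3 * Int_Str + (1 | ID), data = dm)
```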
With reasonable diagnostics, we can look at a summary.
summary(mcat2)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: dEnergy ~ cons3 * Int_Str + (1 | ID)
Data: dm
REML criterion at convergence: 943.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.6025 -0.6516 0.0136 0.7113 2.2028
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 0.464 0.681
Residual 1.315 1.147
Number of obs: 283, groups: ID, 88
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.6934 0.1635 118.5695 22.59 < 2e-16 ***
cons3Mid -0.0514 0.3064 97.6260 -0.17 0.86716
cons3High 0.6270 0.2645 119.6594 2.37 0.01935 *
Int_Str 0.7898 0.2317 266.3274 3.41 0.00075 ***
cons3Mid:Int_Str -0.9873 0.5050 272.8990 -1.96 0.05157 .
cons3High:Int_Str -0.6233 0.3791 276.2360 -1.64 0.10128
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) cns3Md cns3Hg Int_St c3M:I_
cons3Mid -0.534
cons3High -0.618 0.330
Int_Str -0.446 0.238 0.276
cns3Md:In_S 0.205 -0.375 -0.127 -0.459
cns3Hgh:I_S 0.273 -0.146 -0.462 -0.611 0.280
5.1 Plotting
With categorical x categorical interactions, visreg() produces OK but not great figures, as shown in the following. We can see the means of energy for all 6 cells (the cross of 3 levels of conscientiousness x 2 levels of Int_Str).
dm[, Int_Str := factor(
  Int_Str,
  levels = c(0, 1),
  labels = c("No interaction with stranger", "Interacted with stranger"))]

visreg(mcat2, xvar = "Int_Str",
  by = "cons3", overlay = TRUE,
  partial = FALSE, rug = FALSE)
5.2 Simple Effects
When working with two categorical interactions (or with a categorical predictor with >2 levels where you want to test various group differences), the emmeans() function from the emmeans package is helpful. We can get the means of interaction with stranger by conscientiousness group and get confidence intervals and p-values. These p-values test whether each mean is different from zero, by default.
## re-run mcat2 now with Int_Str as factor
mcat2 <- lmer(dEnergy ~ cons3 * Int_Str + (1 | ID), data = dm)

em <- emmeans(mcat2, "Int_Str", by = "cons3",
  lmer.df = "satterthwaite")

summary(em, infer = TRUE)
cons3 = Low:
 Int_Str                      emmean    SE    df lower.CL upper.CL t.ratio p.value
 No interaction with stranger   3.69 0.164 118.6     3.37     4.02  22.586 <0.0001
 Interacted with stranger       4.48 0.216 212.4     4.06     4.91  20.770 <0.0001

cons3 = Mid:
 Int_Str                      emmean    SE    df lower.CL upper.CL t.ratio p.value
 No interaction with stranger   3.64 0.259  90.5     3.13     4.16  14.058 <0.0001
 Interacted with stranger       3.44 0.432 245.6     2.59     4.29   7.981 <0.0001

cons3 = High:
 Int_Str                      emmean    SE    df lower.CL upper.CL t.ratio p.value
 No interaction with stranger   4.32 0.208 120.3     3.91     4.73  20.784 <0.0001
 Interacted with stranger       4.49 0.273 183.0     3.95     5.03  16.446 <0.0001
Degrees-of-freedom method: satterthwaite
Confidence level used: 0.95
A nice plot, with confidence intervals for the fixed effects, can be obtained by using the emmip() function from the emmeans package. It takes as input the results from emmeans(), not the lmer() model results directly. Here is a simple plot showing the categorical interactions. Note that with this approach, you could basically fit the same model(s) that you would with a repeated measures or mixed effects ANOVA model, with the advantage that LMMs do not require balanced designs and allow both categorical and continuous predictors (e.g., you could include continuous covariates easily). GLMs and (G)LMMs can do everything that t-tests and various ANOVAs can, but with greater flexibility.
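For example, using the em object created above with emmeans():

```r
## interaction plot with 95% CIs for the fixed-effect cell means
emmip(em, cons3 ~ Int_Str, CIs = TRUE)
```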
6 Model Comparisons
For many statistical models, including LMMs, it can be informative to compare different models. Model comparisons can be used in many different ways. Here are some examples, although they are not meant to be exhaustive.
Evaluate which of two (or more) models provides the best fit to the data
Evaluate / compare how the results for a particular predictor(s) of interest change across two (or more) models
Calculate the significance of multiple predictors
Calculate effect sizes for one or multiple predictors
We will look at examples of the different uses of model comparisons in this topic.
Just as there are many different kinds of models we can fit, even within LMMs (e.g., with or without random slopes, etc.), so too are there many different kinds of, and purposes for, model comparisons.
To begin with, let’s imagine we are just trying to compare two models: Model A and Model B. We can broadly classify the type of comparison based on whether Model A and B are nested or non-nested models. We will talk about what these mean next.
6.1 Nested Models
Models are considered nested when one model is a restricted or constrained version of another model. For example, suppose that Model A predicts mood from stress and loneliness whereas Model B predicts mood from loneliness only. Written as formulas, these could be:

\[
\text{Model A: } mood_{ij} = b_{0j} + b_1 * loneliness_{ij} + b_2 * stress_{ij} + \varepsilon_{ij}
\]

\[
\text{Model B: } mood_{ij} = b_{0j} + b_1 * loneliness_{ij} + 0 * stress_{ij} + \varepsilon_{ij}
\]

In Model B, I purposely used \(0 * stress_{ij}\) to highlight that when a predictor is left out of a model, it is the same as fixing (sometimes also called constraining) the coefficient (\(b_2\) in this case) to be exactly 0. In this case, we would say that Model B is nested within Model A. In other words, Model A contains every predictor and parameter in Model B, plus more.
This idea is similar to the idea of nested data used in LMMs, but the difference is that we are not talking about observations or data, rather we are talking about the parameters of a model.
To summarize, briefly, we say that Model B is nested within Model A if:
setting one or more parameters in Model A to 0 yields the same model as Model B.
both models use the same data, have the same number of observations, etc. Even if the same dataset is used in R, this does not guarantee the same data are used, because if additional predictors included in Model A have some missing data, by default R will drop those cases, resulting in a different subset of cases being used for Model A than for Model B
If two models are nested, then we have the most options in terms of comparing the two models. For example, we can evaluate whether Model A is a statistically significantly better fit than Model B using a Likelihood Ratio Test (LRT).
We can compare the fit of each model and use the difference in fit to derive effect sizes. We also can attribute any differences between the two models to the parameter(s) that have been constrained to 0 in Model B relative to Model A.
A simple definition of the LRT test statistic, \(\lambda\), is based on two times the difference in the log likelihoods.
\[
\lambda = -2 * (LL_B - LL_A)
\]
You may wonder why the likelihood ratio test can be conducted by taking the difference in the log likelihoods. It is because the log of a ratio is the same as the difference in the logs of the numerator and denominator.
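Written out explicitly, with \(L\) denoting the likelihood and \(LL = log_e(L)\), this identity connects the ratio form of the statistic to the difference form used above:

\[
\lambda = -2 \log_e\left(\frac{L_B}{L_A}\right) = -2\left(\log_e L_B - \log_e L_A\right) = -2(LL_B - LL_A)
\]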
If the null hypothesis of no difference is true in the population, then \(\lambda\) will follow a chi-squared distribution with degrees of freedom equal to the number of parameters constrained to 0 in Model B relative to Model A, that is, the difference in degrees of freedom used by each model:
\[
\lambda \sim \chi^2(DF_A - DF_B)
\]
Thus we often use a chi-square distribution in the LRT to look up the p-value. Finally, note that because LRTs are based on the log likelihoods (LL) from a model, we need true log likelihoods for the LRT to be valid. Therefore, we cannot use restricted maximum likelihood; we need to use maximum likelihood estimation.
6.1.1 Nested Models in R
To see the idea of nested models and the LRT in action, let’s examine a concrete example in R. Here are two LMMs corresponding to the Model A and Model B formulas we wrote previously. We can see in the R code that the models are nested; the only difference is stress. We set REML = FALSE to get maximum likelihood estimates so that we get true log likelihoods. We also need to confirm that the two models are based on the same number of observations. We can extract just this information in R using the nobs() function. This lets us confirm that the two models are fitted to the same data. For example, if stress had missing data on cases that were not missing loneliness or mood, Model A could be based on fewer observations than Model B.
modela <- lmer(dMood ~ loneliness + dStress + (1 | ID), data = dm, REML = FALSE)
modelb <- lmer(dMood ~ loneliness + (1 | ID), data = dm, REML = FALSE)
nobs(modela)
[1] 283
nobs(modelb)
[1] 283
In this case, we can see that the number of observations is identical. Now we can find the log likelihoods of both models by using the logLik() function.
logLik(modela)
'log Lik.' -396.8 (df=5)
logLik(modelb)
'log Lik.' -432.8 (df=4)
Now we have the log likelihoods (LL) of each model and the degrees of freedom from both models. From this, we can calculate \(\lambda\) and then look up the p-value for a chi-square distribution with \(\lambda\) and 1 degree of freedom (1 is the difference in degrees of freedom between Model A and B). To get the p-value from a chi-square, we use the pchisq() function.
## lambda
-2 * (-432.79 - -396.81)
[1] 71.96
## p-value from a chi-square
pchisq(71.96, df = 1, lower.tail = FALSE)
[1] 2.196e-17
In practice, we do not need to do these steps by hand; we can get a test of two nested models in R using the anova() function (which, despite its name, is not actually conducting an analysis of variance here). We use the anova() function with test = "LRT" to get a likelihood ratio test.
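A minimal sketch of that call, using the modela and modelb objects fitted above (the output itself is omitted here):

```r
## likelihood ratio test of the two nested models
anova(modela, modelb, test = "LRT")
```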
If you compare the chi-square value, degrees of freedom and p-value, you’ll see that they basically match what we calculated by hand. The small differences are due to rounding (R will use more decimals of precision whereas we only used two decimals).
In this simple case, we are only testing a single parameter, the fixed regression coefficient for stress, because that is the only parameter that differs between Model A and Model B. Thus in this case, it would be easier to rely on the t-test we get from a summary of the model.
summary(modela)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula: dMood ~ loneliness + dStress + (1 | ID)
Data: dm
AIC BIC logLik -2*log(L) df.resid
803.6 821.8 -396.8 793.6 278
Scaled residuals:
Min 1Q Median 3Q Max
-2.935 -0.665 0.145 0.656 2.163
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 0.106 0.326
Residual 0.875 0.936
Number of obs: 283, groups: ID, 88
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 7.5359 0.2862 93.3199 26.33 < 2e-16 ***
loneliness -0.4935 0.1167 73.2663 -4.23 6.7e-05 ***
dStress -0.4132 0.0446 241.5678 -9.26 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) lnlnss
loneliness -0.740
dStress -0.472 -0.197
In this instance, the LRT and the t-test happen to yield essentially the same conclusion (both p-values are vanishingly small); however, this does not have to be the case. The LRT is based on slightly different theory than the t-test: the t-test uses approximate degrees of freedom for the t-distribution based on the Satterthwaite method, whereas the LRT does not directly incorporate the sample size in the same way. The two methods are asymptotically equivalent (at very large sample sizes) but can differ, particularly in smaller samples.
In practice, we wouldn’t usually use a LRT to evaluate whether a single model parameter is statistically significant. The benefit of (nested) model comparisons is that it allows us to compare two models. Those models can be quite different.
Here are another two models, this time they differ by two predictors, energy and sex. The LRT now tests whether the addition of energy and sex results in significantly better fit for Model A than Model B.
modela <- lmer(dMood ~ loneliness + stress + dEnergy + sex + (1 | ID),
               data = dm, REML = FALSE)
modelb <- lmer(dMood ~ loneliness + stress + (1 | ID), data = dm, REML = FALSE)
nobs(modela)
In this case, we can see from the significant p-value that Model A is a significantly better fit to the data than is Model B. Note that LRTs are only appropriate when the models are nested. We cannot use LRTs for non-nested models.
In nested models, the more complex model is often called the “Full” model and the simpler model the “Reduced” or “Restricted” version of the Full model.
6.2 Non Nested Models
Models are considered non nested when one model is not strictly a constrained version of a more complex model. For example, suppose that Model A predicts mood from loneliness and stress whereas Model B predicts mood from loneliness and sex. Written as formulas, these could be:

\[
\text{Model A: } mood_{ij} = b_{0j} + b_1 * loneliness_{ij} + b_2 * stress_{ij} + \varepsilon_{ij}
\]

\[
\text{Model B: } mood_{ij} = b_{0j} + b_1 * loneliness_{ij} + b_3 * sex_{j} + \varepsilon_{ij}
\]
Although Model B includes loneliness, which also is present in Model A, and restricts stress to be 0, it adds another predictor, sex, which is not in Model A. Therefore, the two models are not nested. Even though we could fit both models, they use the same number of observations, and the outcome is the same in both, the models still are not nested. Although we can ask R to conduct a LRT, this LRT is not valid. It is shown here to highlight that you, as the analyst, are responsible for determining whether your two models are nested and therefore for deciding whether a LRT is an appropriate way to evaluate whether the models are significantly different from each other.
modela <- lmer(dMood ~ loneliness + dStress + (1 | ID),
               data = dm, REML = FALSE)
modelb <- lmer(dMood ~ loneliness + sex + (1 | ID), data = dm, REML = FALSE)
nobs(modela)
[1] 283
nobs(modelb)
[1] 283
## this is NOT an appropriate LRT
anova(modela, modelb, test = "LRT")
Although we cannot conduct a LRT on non nested models, it is still useful to compare the fit of non-nested models. For example, if one is a much better fit than another model, that may suggest one set of predictors is superior or can be used to evaluate competing hypotheses (e.g., Theory 1 says that stress and family history are the most important predictors of mood and Theory 2 says that age and sleep are the best predictors of mood — these two theories are competing, not nested versions of each other).
We cannot use LRTs to compare non nested models, but we can use other measures, including performance measures such as variance explained or model accuracy, and information criteria. We will talk about information criteria next.
6.3 Information Criteria
Two of the most widely used information criteria are:
AIC: the Akaike Information Criterion (AIC)
BIC: the Bayesian Information Criterion (BIC), also sometimes called the Schwarz Information Criterion
Both the AIC and BIC are calculated based primarily on the log likelihood, LL, of a model. One way of thinking about these information criteria is to think of models as ways of approximating reality. Suppose you have two models that both generate approximations (predictions) of reality. The model whose predictions are closer to the observed data will have a higher LL. The LL can be used as a relative measure of model fit; it is not an absolute measure of fit. That is, we do not interpret a specific LL value as indicating “good” fit, only which of a set of (all potentially bad) models is the best.
However, there is a limitation to using the LL alone. The LL will always stay the same or increase as additional predictors / parameters are added to the model. Thus, if we used the LL alone, out of a set of competing models we would virtually always pick the most complex one. To address this, we need to incorporate some penalty for the complexity of a model, so that for a more complex model to be chosen over a simpler one as the “best”, it has to improve the LL enough.
The AIC and BIC are very similar except that they use different penalties for model complexity (technically they are derived from rather different theoretical foundations, but for practical purposes they are similar other than the complexity penalties).
A common way of defining model complexity is based on the number of estimated model parameters, \(k\). For example, consider the following simple linear regression for \(n\) different observations:

\[
y_i = b_0 + b_1 * x_i + \varepsilon_i
\]

This model estimates three parameters (\(k = 3\)): the intercept, \(b_0\), the slope, \(b_1\), and the residual variance, \(\sigma^2_{\varepsilon}\). With \(k\) defined, the AIC and BIC are calculated as:

\[
AIC = 2k - 2LL
\]

\[
BIC = log_{e}(n) * k - 2LL
\]

where \(n\) is the number of observations included in the model and \(LL\) is the log likelihood, with higher values indicating a better fitting model. These equations highlight that the only difference between the AIC and BIC is whether the number of parameters in the model, \(k\), is multiplied by \(2\) (AIC) or by \(log_{e}(n)\) (BIC). Thus, the AIC and BIC will be identical if:

\[
log_{e}(n) = 2, \text{ that is, } n = e^2 \approx 7.39
\]
If \(n < e^2 \approx 7.39\) then the BIC will have a weaker penalty based on the number of parameters, \(k\), than the AIC. If \(n > e^2 \approx 7.39\) then the BIC will have a stronger penalty based on the number of parameters, \(k\), than the AIC. Functionally, a stronger penalty on the number of parameters means that a particular information criterion will tend to favor more parsimonious (less complex) models. Thus, for all but the tiniest of sample sizes (\(\approx 7.39\)) the BIC will have a stronger penalty on model complexity and so will favor relatively more parsimonious models than will AIC.
For both AIC and BIC, the relatively better model of those compared is the model with the lower value (i.e., lower = better for both AIC and BIC).
There is no “right” or “wrong” information criterion to use. People use the AIC and/or the BIC. If both the AIC and BIC suggest the same model is the “best”, there is no ambiguity. Sometimes the AIC and BIC disagree about which model is the best; in these cases, one must pick which information criterion to go with. Both criteria require that the number of observations, \(n\), be larger than the number of parameters, \(k\), to operate well.
A benefit of the AIC and BIC is that both can be used to compare non nested models and they also can be used to compare nested models. It is relatively common to use AIC and/or BIC to “choose” amongst a set of possible models.
In R we can usually calculate AIC and BIC using the functions, AIC() and BIC(). Here we will calculate the AIC and BIC for our two models.
AIC(modela, modelb)
df AIC
modela 5 803.6
modelb 5 875.6
BIC(modela, modelb)
df BIC
modela 5 821.8
modelb 5 893.8
In this case, both the AIC and BIC are lower for Model A than for Model B, indicating that Model A is the optimal of those two models. Again, the relative interpretation is important. We cannot conclude that Model A is a “good” model, only that it is better than Model B.
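As a check on the formulas, the values AIC() and BIC() reported for Model A can be reproduced by hand from the quantities shown earlier (LL = -396.8 and df = 5 from logLik(), n = 283 from nobs()):

```r
ll <- -396.8   # log likelihood of Model A (from logLik())
k  <- 5        # number of estimated parameters (df from logLik())
n  <- 283      # number of observations (from nobs())

aic <- 2 * k - 2 * ll          # 2*5 + 793.6 = 803.6
bic <- log(n) * k - 2 * ll     # log(283)*5 + 793.6, approximately 821.8
```

These match the AIC and BIC output above up to rounding.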
6.4 Model Selection
Integrating what we have learned about LRT for nested models and AIC/BIC for nested or non nested models, we can compare across several models to select the best model.
First, we fit a series of models. In all cases, mood is the outcome. First we have an intercept only model (m0), an unconditional model, as a baseline reference. This is helpful as all the fit indices are relative. Then we fit polynomials of stress with degree 1 (linear; m1); degree 2 (quadratic; m2); degree 3 (cubic; m3).
We fit different degree polynomials of stress using the poly() function in R. Finally we fit a competing model, maybe a linear effect of energy is a better predictor of mood than stress.
Next, we check that all the observations are identical across models. If the observations were not the same across models, we would need to create a new dataset that had any missing data on any variable used in any of the models excluded. This is a critical step if needed, because LRTs and AIC and BIC are only valid if based on the same data.
dm[, dStress := as.numeric(dStress)]
dm[, dMood := as.numeric(dMood)]

m0 <- lmer(dMood ~ 1 + (1 | ID), data = dm, REML = FALSE)
m1 <- lmer(dMood ~ poly(dStress, 1) + (1 | ID), data = dm, REML = FALSE)
m2 <- lmer(dMood ~ poly(dStress, 2) + (1 | ID), data = dm, REML = FALSE)
m3 <- lmer(dMood ~ poly(dStress, 3) + (1 | ID), data = dm, REML = FALSE)
malt <- lmer(dMood ~ dEnergy + (1 | ID), data = dm, REML = FALSE)

## check all the observations are the same
nobs(m0)
[1] 283
nobs(m1)
[1] 283
nobs(m2)
[1] 283
nobs(m3)
[1] 283
nobs(malt)
[1] 283
Now we can see which model is the best fit. For the nested models (m0 - m3) we can use LRTs. Technically, m0 also is nested in malt, so we can use a LRT for that too. We cannot use a LRT for, say, m1 and malt as those models are not nested. We can use AIC and BIC for all models, though. Note that with multiple models, the anova() function is doing sequential LRT tests (e.g., m0 vs m1; m1 vs m2, etc.) not all compared to one model. If you want two specific models compared, specify just two.
#### LRTs for nested models ####

## two specific comparisons
anova(m0, m3, test = "LRT")
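The sequential tests and information criteria referred to below could be obtained with calls along these lines (a sketch; the output is omitted here):

```r
## sequential LRTs: m0 vs m1, m1 vs m2, m2 vs m3
anova(m0, m1, m2, m3, test = "LRT")

## information criteria for all candidate models,
## including the non-nested alternative (malt)
AIC(m0, m1, m2, m3, malt)
BIC(m0, m1, m2, m3, malt)
```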
From the LRT we can see that m3 is significantly better fit than m0. Looking at the sequential tests, however, m2 is no better than m1 and m3 is no better than m2, which might suggest m1 is the best model. If we look at the AIC values, for the models with stress (m1,m2,m3), AIC selects m1 as the best model (by 1.45 compared to m2, very close). BIC also selects m1 over m2. However, both AIC and BIC indicate that the alternate model (malt) is the best of all the models evaluated, suggesting that linear energy is a better predictor of mood than is linear, quadratic, or cubic polynomials of stress.
If we were picking a stress model only, we would pick the linear (m1) model, because it is best on all indices (LRT, AIC, BIC). If we were willing to pick energy over stress, we would clearly go with energy.
In addition to testing fixed effects, these same methods can be used to test random effects. Here we will fit a new model, m4, that includes a random slope of stress and the correlation between the random intercept and the random stress slope. Then we compare the linear stress model with a random slope (m4), linear stress without a random slope (m1), and the intercept only model (m0). Please note that m4 did not converge; normally we would simplify our model, but for demonstration purposes we do not want to remove the random slope of stress, so in this case we have to remember not to “trust” the results for m4.
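The fitting code for m4 is not shown above; one plausible specification, with a random intercept, a random linear slope of stress, and their correlation, would look like this (the exact formula used originally may differ):

```r
## random intercept and random slope of dStress, correlated;
## this is the fit that produces the convergence warning below
m4 <- lmer(dMood ~ poly(dStress, 1) + (1 + dStress | ID),
           data = dm, REML = FALSE)
```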
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| = 0.00337365 (tol = 0.002, component 1)
See ?lme4::convergence and ?lme4::troubleshooting.
## check all the observations are the same
nobs(m0)
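The comparisons described next can be run as pairwise LRTs plus information criteria (a sketch using the models above; output omitted):

```r
anova(m0, m1, test = "LRT")   # fixed linear stress vs intercept only
anova(m1, m4, test = "LRT")   # does the random slope (+ correlation) help?
anova(m0, m4, test = "LRT")   # omnibus: fixed + random stress vs intercept only

AIC(m0, m1, m4)
BIC(m0, m1, m4)
```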
From these tests we can conclude several things. Firstly, adding a linear trend of stress as a fixed effect is significantly better than the intercept only model. Secondly (if our model had converged), adding a random slope of stress and the correlation of the random slope and intercept is also significantly better than the fixed linear effect of stress model. Thirdly, the model with stress as both a fixed and random slope is significantly better than the intercept only model (m4 vs m0 is, in a sense, an omnibus test of adding stress as both a fixed and random slope to the model).
Finally, AIC favours m4 and BIC favours m1 over the other models; since m4 did not converge, this suggests that m1 is the best balance of complexity and model fit (and certainly better than m0).
The main point of this final exercise is to show how you can use LRTs and AIC and BIC to evaluate whether random effects “help” a model fit.
7 Summary
7.1 Conceptual
Key points to take away conceptually are:
How to interpret standard notation in LMM equations, including with interaction terms
What the different types of LMM interactions are
How to include interactions/moderation in LMMs in R
How to understand whether there is a significant interaction
How to test and interpret interactions
How to test simple slopes / simple effects from different kinds of interactions
How comparing models is a powerful and flexible way to test many different hypotheses
The differences between nested and non nested models, as well as what sorts of comparisons are valid for each
What they are and how to interpret LRT, AIC, BIC
How to select the “best” model empirically using model comparison techniques.
7.2 Code
Function             What it does
lmer()               estimate a LMM
confint()            calculate confidence intervals for a LMM
visreg()             create marginal or conditional graphs from a LMM
modelDiagnostics()   evaluate model diagnostics for LMMs, including multivariate normality
summary()            get a summary of model results
modelTest()          along with APAStyler(), get a nicely formatted summary of model results, including many automatic effect sizes
emmeans()            test specific means from a model
emtrends()           test simple slopes from a model
AIC()                calculate AIC for LMMs
BIC()                calculate BIC for LMMs
anova()              compare two nested LMMs and calculate a LRT
poly()               fit a polynomial of specified degree for a predictor in a LMM
7.3 Extra (in case you ever want to fix left skews - optional material)
Applying transformations to left skewed data is more difficult, as transformations generally work on long right tails. A solution is to reverse the variable, apply the transformation, and then reverse it again so that the direction is the same as it originally was. We could try a square root transformation, which is milder than a log transformation. To reverse the variable, we subtract it from the sum of its minimum and maximum. Next we take the square root, then we reverse back by again subtracting from the sum of the minimum and maximum, now on the square root scale.
Let’s sidetrack a little and try this out with an example using a slightly skewed outcome variable, dMood.
## First, let's see how the model looks with the original dMood scores
mtest1 <- lmer(dMood ~ neuroticism * dStress + (1 | ID), data = dm)
mt1 <- modelDiagnostics(mtest1)
plot(mt1, nrow = 2, ncol = 2, ask = FALSE)
## Now let's try to fix the very mild left skew just for the sake
## of demonstration
max(dm$dMood) + min(dm$dMood)
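Putting the steps from the text together, here is a sketch of the reverse, square root, reverse-back transformation (the variable name dMoodSqrt and the refit below are illustrative):

```r
## sum of the minimum and maximum, used to reverse the variable
s <- max(dm$dMood) + min(dm$dMood)

## reverse, square root, then reverse back on the square root scale
dm[, dMoodSqrt := sqrt(s) - sqrt(s - dMood)]

## refit the model and re-check diagnostics on the transformed outcome
mtest2 <- lmer(dMoodSqrt ~ neuroticism * dStress + (1 | ID), data = dm)
plot(modelDiagnostics(mtest2), nrow = 2, ncol = 2, ask = FALSE)
```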
The transformation appears to have modestly helped the distribution of residuals. It's not clear whether the skew was bad enough to begin with, or whether the transformation improved things enough, to be worth the added difficulty in interpretation (dMood is now square root transformed, and that must be incorporated into its interpretation). For the lecture, we did this for demonstration purposes; in practice, consider whether this is worth it or only adds difficulty in understanding without improving / changing the results much.